Figure 18: The two- and three-body correlation functions $\bar\xi_2(\ell)$ (top) and $\bar\xi_3(\ell)$ (bottom) as determined
from Eq. 32 (open symbols). They agree very well with the determinations from the moments of the
counts in cells (filled symbols), but only a much narrower range of scales can be analyzed in such a way.
Appendix: Another Approach to $\bar\xi_N$



The correlation functions can be derived from the counts in cells data in a different way from that 
described in the main text. Let us define 

where, for convenience, we will set = and = 1. In the limit $\bar N \ll 1$, we clearly have

^ + 0(N) . (32) 

Also note that $dZ_q/d\bar N = Z_{q+1}$. Now we use the convenient fact that $P_0 = \exp Z_0$, and the derivative
rule in Eq. 4 relating $P_N$ to $P_0$, to find:



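$P_1 = -\bar N Z_1 P_0$ , (33a)

$P_2 = \frac{\bar N^2}{2} (Z_2 + Z_1^2) P_0$ , (33b)

$P_3 = -\frac{\bar N^3}{6} (Z_3 + 3 Z_2 Z_1 + Z_1^3) P_0$ , (33c)

$P_4 = \frac{\bar N^4}{24} (Z_4 + 4 Z_1 Z_3 + 6 Z_1^2 Z_2 + 3 Z_2^2 + Z_1^4) P_0$ , (33d)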

and so on. Using the small $\bar N$ limit above, these can be turned directly into expressions for the $\bar\xi_N$. Thus,
measuring the $P_N$ directly leads to estimates of the $\bar\xi_N$. We have found this process to be very
unstable for large values of $N$, but using $P_2$ and $P_3$, we find reasonable measurements of $\bar\xi_2$ and $\bar\xi_3$.
Fig. 18 shows this comparison: the solid squares connected by solid lines are the average correlation
functions, taken from Figs. 3 and 5 above. The open circles connected by dashed lines are derived from
$P_2$ and $P_3$ using the volume-limited sample to 7600 km s$^{-1}$ and Eqs. 33b and 33c, respectively. For this
subsample, $\bar N < 0.1$ for the range of values of $\ell$ for which $\bar\xi_N$ is derived from this formalism, thus the
expansion above is valid. The agreement is good, but because Eqs. 16 - 18 are exact, while Eq. 32 is
an approximation, the former are more robust. Moreover, one gets positive results for the $\bar\xi_N$ over only
a narrow range of scales, as Fig. 18 shows. For instance, one can determine values of $\bar\xi_4$ from $P_4$ at only two
scales! This method is therefore inferior to the determination of the $\bar\xi_N$ from moments.
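
As a consistency check on the expressions for $P_1$ to $P_4$ in terms of the $Z_q$, the following minimal sketch evaluates them for a pure Poisson process, for which $P_0 = e^{-\bar N}$ and hence $Z_0 = -\bar N$, $Z_1 = -1$ and $Z_q = 0$ for $q \ge 2$. The function names are hypothetical, and the prefactors $(-\bar N)^N/N!$ are an assumption: they follow from taking the derivative rule to be $P_N = [(-\bar N)^N/N!]\, d^N P_0/d\bar N^N$ together with $dZ_q/d\bar N = Z_{q+1}$.

```python
import math


def counts_from_z(nbar, p0, z1, z2, z3, z4):
    """Return (P1, P2, P3, P4) from P0 and Z1..Z4.

    Assumes P_N = (-nbar)^N / N! * d^N P0 / d nbar^N, P0 = exp(Z0)
    and dZ_q / d nbar = Z_{q+1} (illustrative reconstruction).
    """
    p1 = -nbar * z1 * p0
    p2 = nbar**2 / 2.0 * (z2 + z1**2) * p0
    p3 = -nbar**3 / 6.0 * (z3 + 3.0 * z2 * z1 + z1**3) * p0
    p4 = nbar**4 / 24.0 * (
        z4 + 4.0 * z1 * z3 + 6.0 * z1**2 * z2 + 3.0 * z2**2 + z1**4
    ) * p0
    return p1, p2, p3, p4


def poisson(nbar, n):
    """Poisson probability of finding n objects when the mean is nbar."""
    return nbar**n * math.exp(-nbar) / math.factorial(n)


nbar = 0.1
# Poisson case: Z1 = -1 and all higher Z_q vanish.
probs = counts_from_z(nbar, math.exp(-nbar), -1.0, 0.0, 0.0, 0.0)
for n, p in enumerate(probs, start=1):
    print(n, p, poisson(nbar, n))
```

Each printed pair should agree, which confirms the alternating signs and the combinatorial coefficients of the four expressions in this special case.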
as the CfA survey (cf., Alimi et al. 1990), but in a much larger volume. Finally, Saunders et al. (1992)
are in the process of obtaining redshifts for all IRAS galaxies to the flux limit of the IRAS Point Source
Catalog, which should result in an appreciably denser sample than the one used in the present analysis.
Thus we can look forward in a few years to testing if the scale-invariant forms, which seem to hold so
well in $N$-body models, adequately describe the real universe.
Foundation. MD's research is supported in part by NASA grant NAG5-1360 and NSF grant AST-
it is not clear if IRAS galaxies or optically selected galaxies are a more faithful tracer of the underlying
mass distribution. Finally, the boosting is done in a rather ad hoc way, and is unlikely to reproduce all
the features of counts in cells done from optically selected samples of galaxies; comparable analyses on
large optically selected samples are needed to explore this further (cf., Gaztañaga 1992).

The astonishingly good agreement found here between the observed scalings and those predicted
by second-order perturbation theory given Gaussian initial conditions appears to be a strong
confirmation of the random-phase hypothesis. However, we have not explicitly tested non-Gaussian models;
Luo & Schramm (1993) argue that non-linear evolution of initially non-Gaussian models makes them
appear remarkably like their Gaussian counterparts in the scalings of the various moments of the counts
in cells. $N$-body simulations of non-Gaussian models, along the lines of Weinberg & Cole (1992), are
needed to tell if the observed constancy of $S_3$ and $S_4$ can be used to rule out classes of such models (see
Coles et al. 1993 for preliminary investigations along these lines). It is remarkable that the constancy of
$S_3$ predicted from second-order perturbation theory holds true even in the non-linear regime ($\bar\xi_2 \sim 10$).
On the other hand, this is exactly what was found in $N$-body simulations of various cosmological
scenarios by Bouchet et al. (1991) and Bouchet & Hernquist (1992), results that have been confirmed and
extended by Lucchin et al. (1993) and Weinberg (private communication). However, an explanation
based on dynamics continues to elude us. Moreover, Colombi et al. (1993a) find constancy of the $S_N$
in their $N$-body simulations only after finite volume corrections, if the simulation volume is too small;
since we find the $S_N$ to be independent of scale without any such corrections, we may indeed have
sampled volumes large enough not to be grossly affected at the scales we probed.

Szalay (1988) shows that if the biasing scheme is non-linear (that is, if there is a non-linear relation
between the mass and galaxy density fields), then one generically expects a relation between the two-
and three-point correlation functions involving a cubic term, leading to a dependence of $S_3$ on $\bar\xi_2$.
However, this is a statement about initial conditions: non-linear evolution can quickly erase this cubic
term (Gott, Bin, & Park 1991), and thus we cannot put constraints on the non-linearity of the biasing
scheme. Moreover, Fry & Gaztañaga (1993) show that the scaling relations are preserved for local
non-linear biasing schemes.

We found that the void probability function scales according to the prediction of the scale-invariant
model, in which all $S_N$ are constant. This scaling is highly non-trivial, since we are probing the
non-perturbative regime in which correlations of all orders are important. However, we only probe the
dilute regime, for which the number of clustered particles above the mean is not much larger than unity.
This limitation is due to the sparse sampling of the galaxy distribution by IRAS galaxies, and prevents
study of further predictions of the scale-invariant models, or discrimination between various competing
models. Thus sparse sampling strategies (Kaiser 1986), while optimal for determining $\bar\xi_2$ on large scales,
are not appropriate for determining higher-order moments of the galaxy distribution. Sparse sampling also precludes
using the overall shape of the counts to distinguish between the various existing theoretical models.
Discrepancies are seen between the counts in cells and various such models in the densest subsample
probed, but finite volume effects do not allow us to conclude firmly that these models do not fit the
data.

What are the prospects for the future? It would be fascinating to extend the analysis discussed here
to the highly non-linear regime, which will require densely sampled surveys of galaxies. Three surveys
in progress come to mind. The Center for Astrophysics survey to $m_B = 15.5$ is nearing completion
(cf., Geller & Huchra 1989), and certain aspects of the void distribution have already been discussed
(Vogeley, Geller, & Huchra 1991). Davis et al. (1992) are finishing a magnitude-limited redshift survey
of optically selected galaxies covering 65% of the sky, which will probe the densely sampled limit as well






the case of numerical simulations (BSD; Bouchet & Hernquist 1992; but also see Suto, Itoh, & Inagaki
1990) this distribution provides quite a good fit to the data. The log-normal model (Eq. 27, short
dashes) also fits the data well, as does the negative binomial model (Eq. 23, long dashes), although none
properly match the steep drop-off of the $P_N$ at large $N$ in the densest sub-sample, with the log-normal
distribution behaving the most poorly. However, this sub-sample is of course of the smallest volume,
and finite volume effects are far from negligible (Colombi et al. 1993a). Thus, as in Fig. 15, we cannot
use the discrepancy between the observed $P_N$ and the models to rule out the latter. In larger volumes, the
sparseness of the samples causes all models to become degenerate. This degeneracy is broken only in
the regime in which the number of clustered points $N_c$ is much larger than unity for scales smaller than
the correlation length (Colombi et al. 1993b).

5. Conclusions 

We have measured the count probability distribution in a series of 10 volume-limited subsamples,
each roughly twice as big in volume as the previous one. We saw that, once the Poisson contribution is
subtracted, the variance of this distribution, $\bar\xi_2$, is well fit by a single power law of index $\gamma = -1.59$ over 2
decades of scale (i.e., for a cell radius $0.5\,h^{-1}$ Mpc $< \ell < 50\,h^{-1}$ Mpc). We have found a weak dependence
of the small-scale correlation strength on IRAS luminosity, which appears at all orders we investigated.
We derived the higher-order correlation functions, up to $\bar\xi_5$, and found them to obey power-laws with
slopes given approximately by $(N-1)\gamma$. We also found that the skewness $\bar\xi_3$ is closely approximated
by $\bar\xi_3 \propto \bar\xi_2^{1.96 \pm 0.06}$ over the range $0.1 < \bar\xi_2 < 10$. A similar regression on the kurtosis of the distribution
yields $\bar\xi_4 \propto \bar\xi_2^{3.03 \pm 0.18}$ over essentially the same range in scales. Thus the data are consistent with the
skewness and kurtosis being simply proportional to the square and the cube of the variance, respectively,
both in the weakly and strongly non-linear regimes. Following our preliminary announcement of this
result (Bouchet et al. 1992a), Gaztañaga (1992) analyzed the CfA and SSRS redshift surveys using
similar techniques, and found very similar results. We looked directly at the ratio $S_3 = \bar\xi_3/\bar\xi_2^2$, and
found that it varies only weakly, if at all, with scale; there is no theoretical reason to expect this to hold
from weakly to strongly non-linear scales (although $N$-body simulations show similar behavior; Bouchet
& Hernquist 1992; Suto 1993; Lucchin et al. 1993; D. Weinberg, private communication). The data are
consistent with $S_3 = 1.5 \pm 0.5$. This value is smaller than that inferred from small-scale measurements on
optically selected samples, which probably reflects the weaker sampling of dense cluster cores by IRAS
galaxies than by optically selected galaxies. If linear biasing is assumed, the value of the biasing parameter
is constrained between 1.6 and 3.2 (one sigma), for a power spectrum index $n = -1.4$, but the linear
biasing model is probably not applicable when looking at statistics which measure the asymmetry of the
density distribution. A similar analysis at the next order gives $S_4 = 4.4 \pm 3.7$; again, the dependence on
scale is at most rather weak. Lahav et al. (1993) suggest that the constancy of $S_3$ with scale is partially
due to redshift space distortions; Matsubara & Suto (1993) indeed find that $S_3$ grows with variance in
analyses of $N$-body simulations evaluated in real space, but that in redshift space, it is much closer to
constant.

We have tested the robustness of these results to the treatment of the clusters. IRAS galaxies are
known to give systematically lower estimates of the density of cluster cores than do optically selected
galaxies; "correction" for this effect greatly increases the observed strength of the correlations, and
causes $S_3$ and $S_4$ to increase somewhat as a function of $\bar\xi_2$ in the transition between the weakly and the
strongly non-linear regime, rather than staying constant. The boosting affects higher-order correlations
more than those of lower order, which means that the value of $S_3$ derived implies a lower value of the bias
parameter than that found without boosting, $0.70 < b < 1.18$. However, this non-linear transformation
of the galaxy density field introduces curvature terms into the relation between $S_3$ and bias. Moreover,
Figure 17: Same as Fig. 1, but the observed $P_N(\ell)$ are now compared with various theoretical models.
The dots correspond to the distribution predicted by Saslaw once the variance is adjusted to equal that
of the data. Short-dashes show a log-normal distribution, while long-dashes show a negative binomial
distribution.
Figure 15: The quantity $\sigma$ plotted against $N_c$ for the densest subsample of Fig. 14. Also plotted are
the hierarchical Poisson model of Fry (1986) (short dashes), the negative binomial model of Gaztañaga
& Yokoyama (1993) (long dashes), the thermodynamic model of Saslaw & Hamilton (1984) (long
dot-dashes) and the phenomenological fit of Alimi et al. (1990) with $\omega = 0.9$ (short dot-dashes).

Figure 16: Variation of characteristic particle numbers, $N_c(\ell)$ (triangles), $N_v(\ell)$ (stars), and $\bar N$ (squares),
in the densest (2400 km s$^{-1}$) sub-sample. We used $\omega = 0.9$ to derive $N_v$. A horizontal line is drawn at
$N = 1$.
and this model is also consistent with the scale-invariant hypothesis.
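
The properties claimed for the thermodynamic model can be checked numerically. The sketch below assumes the Saslaw & Hamilton distribution has the form $P_N = [\bar N(1-b)/N!]\,[\bar N(1-b) + Nb]^{N-1} \exp[-\bar N(1-b) - Nb]$ with $b = 1 - 1/(1+N_c)^{1/2}$; the function name and the test values of $\bar N$ and $N_c$ are illustrative choices, not taken from the survey.

```python
import math


def saslaw_pn(n, nbar, b):
    """Thermodynamic-model count probability P_N (assumed Saslaw-Hamilton form)."""
    a = nbar * (1.0 - b)
    if n == 0:
        return math.exp(-a)
    # Work in logarithms so that large N does not overflow.
    log_p = (
        math.log(a)
        + (n - 1) * math.log(a + n * b)
        - (a + n * b)
        - math.lgamma(n + 1)
    )
    return math.exp(log_p)


nbar, n_c = 2.0, 1.0                      # illustrative values only
b = 1.0 - 1.0 / math.sqrt(1.0 + n_c)
p = [saslaw_pn(n, nbar, b) for n in range(400)]

norm = sum(p)
mean = sum(n * pn for n, pn in enumerate(p))
var = sum((n - mean) ** 2 * pn for n, pn in enumerate(p))
sigma = -math.log(p[0]) / nbar

print("normalisation:", norm)
print("mean, expected nbar:", mean, nbar)
print("variance, expected nbar*(1+Nc):", var, nbar * (1.0 + n_c))
print("sigma, expected (1+Nc)^-1/2:", sigma, 1.0 / math.sqrt(1.0 + n_c))
```

With this choice of $b$ the variance equals $\bar N(1 + N_c)$, and $\sigma$ depends on $N_c$ alone, which is the sense in which the model is scale-invariant.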

Coles & Jones (1991) hypothesize that the density field is given by a log-normal distribution,
which implies that the counts in cells are given by:

$P_N = \frac{1}{(2\pi)^{1/2}\,\sigma\,N!} \int_0^\infty \lambda^{N-1} e^{-\lambda} \exp\left[-\frac{(\log\lambda - \log a)^2}{2\sigma^2}\right] d\lambda$ , (27)

where

$a = \frac{\bar N}{(1 + \bar\xi_2)^{1/2}}$ and $\sigma = [\log(1 + \bar\xi_2)]^{1/2}$ . (28)

The expression $\sigma = -\log P_0/\bar N$ is manifestly not a function of $N_c$ alone, and thus this model is explicitly
non-scale-invariant.

Finally, two models have been proposed for the $P_0$ alone. Fry (1986) has developed a hierarchical
Poisson model in which

$\sigma = (1 - e^{-N_c})/N_c$ , (29)

and Alimi et al. (1990) use a phenomenological fit to the quantity $\sigma$ that asymptotes to a power-law at
large $N_c$:

$\sigma = (1 + N_c/(2\omega))^{-\omega}$ . (30)

Fig. 15 shows the $\sigma$-$N_c$ comparison for the 2400 km s$^{-1}$ subsample, together with four models: the
hierarchical Poisson model (Eq. 29, short dashes), the negative binomial model (Eq. 24, long dashes),
the thermodynamic model (Eq. 26, long dot-dashes), and the phenomenological fit of Eq. 30, with
$\omega = 0.9$ (short dot-dashes). The log-normal model, Eq. 27, is not included in this figure, as it fails to
predict the scale-invariance of $\sigma$. The best fit is that of Eq. 30, but of course, it is the only one with a
free parameter. Moreover, the largest discrepancies between the models and the data occur for values of
$N_c$ which only the sample volume-limited to 2400 km s$^{-1}$ probes (compare Fig. 15 to Fig. 14). In fact,
as typical errors in $\log_{10}\sigma$ are 0.05, the discrepancy between the data and the hierarchical Poisson and
negative binomial models is only at the $1\sigma$ level. Incidentally, Alimi et al. (1990) found a best-fit value
of $\omega = 0.5 \pm 0.15$ from the CfA data they analyzed, but they were able to fit to the region within which
$\sigma$ asymptotes to a power-law of $N_c$, which IRAS does not probe; thus the difference between these two
values is not very significant.
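
To make the comparison of the four void-function models concrete, the sketch below evaluates each $\sigma(N_c)$ on a few values of $N_c$. The function names are hypothetical. The phenomenological form $(1 + N_c/(2\omega))^{-\omega}$ is an assumption, chosen because it reduces to $1 - N_c/2$ at small $N_c$ and tends to a power law of index $-\omega$ at large $N_c$, as the text requires.

```python
import math


def sigma_hierarchical_poisson(n_c):
    """Fry (1986) hierarchical Poisson model."""
    return (1.0 - math.exp(-n_c)) / n_c


def sigma_negative_binomial(n_c):
    """Negative binomial model."""
    return math.log(1.0 + n_c) / n_c


def sigma_thermodynamic(n_c):
    """Saslaw & Hamilton thermodynamic model."""
    return 1.0 / math.sqrt(1.0 + n_c)


def sigma_phenomenological(n_c, omega=0.9):
    """Phenomenological fit; assumed form (1 + Nc / (2 omega)) ** -omega."""
    return (1.0 + n_c / (2.0 * omega)) ** (-omega)


models = {
    "hierarchical Poisson": sigma_hierarchical_poisson,
    "negative binomial": sigma_negative_binomial,
    "thermodynamic": sigma_thermodynamic,
    "phenomenological": sigma_phenomenological,
}

for n_c in (0.01, 0.1, 1.0, 3.0):
    row = ", ".join(f"{name}: {f(n_c):.4f}" for name, f in models.items())
    print(f"Nc = {n_c}: {row}")
```

All four curves share the same leading behaviour $1 - N_c/2$, so they separate only as $N_c$ approaches and exceeds unity, which is why a sparse sample has little power to discriminate between them.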

As discussed in the Introduction, Balian & Schaeffer (1989a) show that the scale-invariant
hypothesis implies that the $P_N$ follow various specific scaling laws in various regimes. These scaling laws
become manifest in the limit of high sampling, and provide the most powerful and direct tests of scale
invariance. Unfortunately, the IRAS sample is simply too sparse to allow us to test these forms.

This can be illustrated as follows. The scaling laws are determined by three characteristic numbers,
$\bar N(\ell)$, the average number of galaxies in a volume, $N_c(\ell)$, which as we saw characterizes the number
of clustered particles in the volume, and $N_v(\ell) = (\log P_0(\ell))^{1/(1-\omega)}$, which is equal to unity on scales
$\ell$ of the characteristic sizes of voids. Interesting scaling laws appear in the limit $1 \ll N_v \ll \bar N \ll N_c$.
Fig. 16 shows these three characteristic numbers as a function of scale for the 2400 km s$^{-1}$ subsample.
The region of parameter space in which this limit is satisfied is vanishingly small, and the situation is
of course worse for the larger and sparser subsamples.

With this limited ability to distinguish models in mind, we now turn to direct comparisons of
the observed $P_N$ with models. Fig. 17 compares the measured $P_N$ of Fig. 1 with various models. The
dots show the distribution predicted by the thermodynamic model of Saslaw (Eq. 25). Contrary to
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^2\L sample ) I j (22) 



in C7 in a volume of size L sample- 



.Ntot 

where Ntot = ^ {^'^ l'^)I'^sample total number of galaxies in the volume, and i2{Lsample) is the 

sample-to-sample variance. Eq. (22) is valid for values oi I such that there is less than one void of size I 
in the volume; that is, Pq {^t^ l'^)L^sample^~^ ~ 1, a condition violated for the point of smallest I (largest 
$N_c$) for each subsample in Fig. 14. Eq. (22) overestimates the true error in $\sigma$ by a large amount when $\sigma \ll 1$.
Of course, the abscissa in Fig. 14 is also subject to error; an estimate of the fractional uncertainty in 
Nc is {S i 1 2(^2{L sample)Y ^ ■ For the five smallest volumes, the shot noise contribution to the error in a 
in Eq. (22) is negligible, and with ^4 » 4 we find: Act/ct ~ ANc/Nc ^ 6^^^ ~ 0.12, 0.08, 0.07, 0.06, 
and 0.05 respectively with increasing sample volume. Note that the ordinate in Fig. 14 is $\log_{10}\sigma$; thus
a fractional error in $\sigma$ of 0.12 corresponds to an error bar in the figure of 0.05.
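
The conversion from a fractional error to an error bar on a base-10 logarithmic axis is first-order error propagation, $\Delta(\log_{10} x) \simeq (\Delta x/x)/\ln 10$. The short sketch below, with a hypothetical helper name, applies it to the five fractional errors quoted above.

```python
import math


def log10_error(fractional_error):
    """First-order error on log10(x) for a small fractional error dx/x."""
    return fractional_error / math.log(10.0)


for frac in (0.12, 0.08, 0.07, 0.06, 0.05):
    print(f"fractional error {frac:.2f} -> error on log10: {log10_error(frac):.3f}")
```

The approximation is linear in the fractional error, so it is only appropriate when that error is small compared with unity, as it is here.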

It is important to note that the regime tested corresponds to relatively small values of $N_c(\ell)$,
i.e., $N_c(\ell) \lesssim 1$, and we are thus only probing the dilute regime ($\sigma \sim 1$), where departures from Poisson
are rather small. Still, if all correlations beyond second order are negligible, then Eq. 1 shows that
$\sigma = 1 - N_c/2$ (short dashes in the figure); it is clearly a bad fit. If we can ignore terms beyond third
order, then $\sigma = 1 - N_c/2 + S_3 N_c^2/6$, which is plotted as the long dashes in the figure, using $S_3 = 1.5$, as
we found above. This also is a poor fit, indicating that higher-order correlations are not negligible. This
clearly shows that, when $N_c \sim 1$, one probes the non-perturbative regime where all orders contribute.
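
The breakdown of the truncated expansions can be seen directly from a few numbers. The sketch below evaluates the second- and third-order truncations of $\sigma(N_c)$, with $S_3 = 1.5$ as the default; the function names are hypothetical.

```python
def sigma_second_order(n_c):
    """sigma if correlations beyond second order are neglected."""
    return 1.0 - n_c / 2.0


def sigma_third_order(n_c, s3=1.5):
    """sigma if correlations beyond third order are neglected."""
    return 1.0 - n_c / 2.0 + s3 * n_c**2 / 6.0


for n_c in (0.1, 0.5, 1.0, 2.0):
    print(n_c, sigma_second_order(n_c), sigma_third_order(n_c))
```

The two truncations already differ by 0.25 at $N_c = 1$, and at $N_c = 2$ the second-order form has fallen to zero while the third-order form has climbed back to unity. Successive terms are therefore of comparable size once $N_c \sim 1$, and no finite truncation can be trusted there.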

Nevertheless, the various scale-invariant models that have been proposed for the specific ensemble
of values for the $S_N$ differ substantially from one another only in the regime $N_c \gg 1$, and become
degenerate in the sparse sampling limit (Colombi et al. 1993a). Balian & Schaeffer (1989a) argue that
scale-invariance implies that $\sigma \propto N_c^{-\omega}$ for $N_c \gg 1$, where $\omega$ is a constant which characterizes the
non-linear clustering of the sample. Fig. 14 shows only marginal evidence that $\sigma$ asymptotically approaches a
power-law; only the densest sample we have, volume-limited to 2400 km s$^{-1}$, probes the regime $N_c > 1$.

Various models have been proposed for the clustering hierarchy, which we are now in a position to
test directly. Among them is the negative binomial model (Carruthers & Shih 1983), which provides a
good fit to the CfA data (Gaztañaga & Yokoyama 1993):

$P_N = \frac{\bar N^N}{N!} (1 + \bar N \bar\xi_2)^{-N - 1/\bar\xi_2} \prod_{j=1}^{N-1} (1 + j\,\bar\xi_2)$ . (23)

In this model,

$\sigma = -\log P_0/\bar N = \log(1 + N_c)/N_c$ (24)

is a function of $N_c$ alone, making it consistent with the scale-invariant hypothesis. Saslaw & Hamilton
(1984; see also Saslaw 1985, 1989) have used thermodynamic arguments to propose that the counts in
cells are given by

$P_N = \frac{\bar N (1 - b)}{N!} [\bar N (1 - b) + N b]^{N-1} \exp[-\bar N (1 - b) - N b]$ , (25)

where $b = 1 - 1/(1 + N_c)^{1/2}$, if the model is to have the same variance as the data (Fry 1986). In this
case,

$\sigma = 1/(1 + N_c)^{1/2}$ , (26)
Figure 14: Departures from Poisson statistics $\sigma = -\log P_0/\bar N$, plotted in the top panel as a function
of the cell sizes $\ell$: from bottom to top the curves correspond to volume-limited samples of increasing
volume. In the bottom panel, $\sigma$ is expressed as a function of $N_c(\ell)$. The short dashed curve is the
prediction $\sigma = 1 - N_c/2$ expected if correlations higher than second order are negligible, while the long
dashed curve is the prediction $\sigma = 1 - N_c/2 + S_3 N_c^2/6$ expected if correlations higher than third order
are negligible.
Figure 13: All determinations of $S_3 = \bar\xi_3/\bar\xi_2^2$ from the boosted counts are presented in the top panel.
The bottom panel shows an equal weight average in bins of values of $\log_{10}\bar\xi_2$ and the average value
(dashes).
3.3. The effect of boosting clusters

Strauss et al. (1992a) show that IRAS galaxies give systematically lower density estimates in the
cores of clusters than do optically selected galaxies. The higher-order correlation functions depend
sensitively on the tails of the density distribution, which motivates us to test the sensitivity of our
results to the cluster densities. Strauss et al. (1992a), in their Table 2, derive density estimates in
the central 100 km s$^{-1}$ of the cores of seven nearby clusters; we have performed counts on the IRAS
sample, in which galaxies associated with the clusters are given extra weight corresponding to the ratio
of the optical and IRAS density estimates. This affects a total of only 126 galaxies (not all of which
enter into one of the volume-limited subsamples). Nevertheless, the results are dramatic. As expected,
all correlations are strengthened, especially for larger $N$. We first show the relations between the $\bar\xi_N$
in Fig. 12. The fit to a line is even better than it was before, and extends over a greater range of
variance (note the difference in scales from Fig. 8). The least-square fits to the points are shown in
the figure, and give $C_3 = 0.56 \pm 0.02$, $D_3 = 2.06 \pm 0.01$, and $C_4 = 1.30 \pm 0.04$, $D_4 = 3.18 \pm 0.03$.
The slopes here are now slightly steeper than before, and are no longer consistent, within the errors,
with the scale-invariant prediction. This is reflected in the derived values of $S_3$, shown in Fig. 13. The
relative constancy of $S_3$ seen in Fig. 10 is no longer seen; in particular, $S_3$ is an increasing function
of $\bar\xi_2$. In addition, the average value of $S_3$ is appreciably higher in the present case. The lower panel
shows an equal weight average in bins of $\log_{10}\bar\xi_2$. The dashed line is the average of the points with
$0.1 < \bar\xi_2 < 10$, namely $3.71 \pm 0.95$, which is more than twice the value we found for the unboosted
case (cf. Fig. 10). This emphasizes the danger of deriving a value of the bias $b$ from these data; by
double-counting $\sim 2\%$ of the galaxies, we decrease the bias estimate by more than a factor of two,
to $0.70 < b < 1.18$. Note that boosting clusters (and thus increasing the correlations) causes the bias
estimate to go down; this is because the three-point correlation function is affected even more than the
two-point correlation function. Of course, this is a very non-linear transformation of the density field;
as we mentioned above, Fry & Gaztañaga (1993) and Juszkiewicz et al. (1993b) show that non-linear
transformations add extra terms to Eq. 21. Not surprisingly, $S_4$ is changed even more than $S_3$ by the
boosting; we find $S_4 = 23.6 \pm 12.1$ for the range $0.1 < \bar\xi_2 < 10$ (compare with the value $4.4 \pm 3.7$ found
above), with an even more dramatic rise in the densest regions.
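
The quoted bias ranges can be approximately reproduced with a few lines of arithmetic. The sketch below is an illustration under two assumptions: that the matter skewness for spherical cells is $S_{3M} = 34/7 - (n + 3)$, and that Eq. 21 is the standard linear-bias relation $S_3 = S_{3M}/b$. The function names are hypothetical.

```python
def s3_matter(n):
    """Perturbation-theory skewness for spherical cells, 34/7 - (n + 3)."""
    return 34.0 / 7.0 - (n + 3.0)


def bias_range(s3, s3_err, n=-1.4):
    """One-sigma range of b, assuming linear bias with S3 = S3_matter / b."""
    s3m = s3_matter(n)
    return s3m / (s3 + s3_err), s3m / (s3 - s3_err)


unboosted = bias_range(1.5, 0.5)     # S3 = 1.5 +/- 0.5
boosted = bias_range(3.71, 0.95)     # S3 = 3.71 +/- 0.95
print("unboosted counts: %.2f < b < %.2f" % unboosted)
print("boosted counts:   %.2f < b < %.2f" % boosted)
```

Because $b$ scales inversely with $S_3$ under this assumption, raising $S_3$ by a factor of about 2.5 through the cluster weighting lowers the inferred bias by the same factor.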

4. Void probability $P_0$

The counts in cells also provide measurements of the void probability function $P_0$. As discussed
in the Introduction, the deviations of $P_0$ from its Poisson value can be quantified by the function $\sigma$;
under the scale-invariance hypothesis, $\sigma$ depends only on $N_c$ (Eqs. 1 and 7). The behavior of $\sigma(n,v)$
thus provides us with an indirect way of testing the scaling of high order moments with variance.

The top panel of Fig. 14 shows the measurements of $\sigma = -\log P_0/\bar N$ versus $\ell$ in our series of
volume-limited samples (which of course are of varying number density), while the bottom panel shows
the same measurements as a function of the number of clustered particles above the mean $N_c(\ell)$. It
clearly demonstrates that $\sigma$ behaves according to the prediction of scale-invariant models to an
amazing degree of accuracy. The agreement between the determinations in different (nearly independent)
subsamples also suggests that the error bars in each of them are rather small, typically of the size of
the plotting symbols themselves, for all but the point of largest $N_c$ in each subsample.

How might we make a quantitative assessment of the errors in $\sigma$? By comparing numerical
simulation results and analytical calculations, Colombi et al. (1993b) obtain an upper limit for the error
Figure 11: All determinations of $S_4 = \bar\xi_4/\bar\xi_2^3$ in our series of volume-limited sub-samples are presented
in the top panel. The bottom panel shows an equal weight average in bins of values of $\log_{10}\bar\xi_2$, as well
as the average value $S_4 = 4.4$ (dashes).
skewness measures the left-right asymmetry of the distribution, i.e., the degree of asymmetry between
overdense and underdense regions. The lower value of $S_3$ for IRAS galaxies than for optically selected
galaxies just reflects the under-representation of IRAS galaxies in dense cluster cores.

Juszkiewicz et al. (1993a) use perturbation theory to calculate the value of $S_3$ resulting from
number counts in spherical cells, and find $S_3 = 34/7 - (n + 3)$, where $n$ is the index of the power
spectrum ($P(k) \propto k^n$). The solid line on the left of the figure indicates the value expected for $n = -1.4$
(and $b = 1$, see below), which is appropriate for the scales probed by our measurements (Fisher et al.
1993a). Similar predictions for $S_3$ can be made in the case of a Gaussian smoothing appropriate to the
QDOT measurements (cf., Juszkiewicz et al. 1993a).

As the results for the skewness depend on the biasing model relating the underlying matter density
field $\delta_m$ to that traced by the galaxies $\delta$, our results put constraints on the model. For simple linear
biasing in which

$\delta = b\,\delta_m$ , (20)

we have

$S_3 = S_{3M}/b$ , (21)

where $S_{3M}$ is the value appropriate for the dark matter, as given by the horizontal line on the left of
the figure. Thus, for $n = -1.4$, our determination of $S_3$ from the data yields $1.6 < b < 3.2$ ($1\sigma$).
While this is the range of values currently in fashion for the bias parameter (e.g., Weinberg 1989), this
result should be taken with a large grain of salt; the value of the bias parameter refers to a model in
which positive and negative densities are treated in equivalent ways (Eq. 20), while the derivation of $b$
above is based on the asymmetry in the density field. Even more damning is the sensitivity of $S_3$ to
the treatment of the clusters of galaxies. We will see in §3.3 that the value of $b$ derived from Eq. (21)
when clusters are given extra weight to match the overdensities seen in optically selected samples of
galaxies (Strauss et al. 1992a) is less than half the value derived here, and thus this estimate of $b$ has
a systematic uncertainty of at least a factor of two. Finally, any non-linear term in the biasing scheme
modifies the relation between $S_3$ and $S_{3M}$ given in Eq. 21, although it does preserve the constancy of
$S_3$ in the weakly non-linear regime (Fry & Gaztañaga 1993; Juszkiewicz et al. 1993b).

Fig. 11 is the exact analog of Fig. 10 for the fourth central moment (kurtosis) of the counts
distribution. The data are given in Table 2. Although the dispersion is much larger than in the
skewness case, we do find that $S_4$ is approximately constant (within the error bars) over the whole
range accessible. If we assume that $S_4$ is a constant, we find $S_4 = 4.4 \pm 3.7$. Even though this value is
rather ill-determined, and one cannot exclude the possibility of a substantial trend of $S_4$ with scale (or
variance), one must bear in mind the 6 orders of magnitude spanned by $\bar\xi_4$ and $\bar\xi_2^3$.

We stop here at the fourth moment. However, we saw in Fig. 5 above that we were able to derive
the fifth moment from the data, and that qualitatively at least it obeyed the scaling laws described in
the Introduction. However, the slopes of the best-fit lines to the second-order and fifth-order correlation
functions are not in perfect agreement, and we saw above that noise in the determination of $S_3$ and $S_4$
did not allow us to conclude definitively that they were independent of scale. These problems only get
worse at higher moments (cf. Table 2 and Fig. 9). Moreover, as discussed at the beginning of § 4, the
higher-order moments become increasingly sensitive to the very rare peaks which may be completely
missed in a finite volume. Fortunately, there are ways to check whether scale invariance holds at high
orders which do not rely on the direct determination of the correlation functions, which we explore in
Figure 10: All determinations of $S_3 = \bar\xi_3/\bar\xi_2^2$ in our series of volume-limited sub-samples are presented
in the top panel. The bottom panel shows an equal weight average in bins of values of $\log_{10}\bar\xi_2$, the
average value (dashes), as well as the values inferred from measurements of $Q$ in the non-linear regime
from optical data (triangles), from measurements of skewness and variance on the QDOT sample (error
bars on left) and the theoretical prediction (solid line on left) from perturbation theory for a power
spectrum of index $n = -1.4$ and no bias ($b = 1$).
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Figure 9: Results of least-squares fits of power laws to the correlation functions as a function of 
for each subsample separately. Thus the data are fit to the form l/(iV — 1) log;^o = Cn + Dn$2- The 
results for different N are given different symbols, and are staggered slightly in the ordinate to avoid 
overlapping error bars. These are the data tabulated in Table 2. In the lower panel, the dashed lines 
indicate the predictions of the scale-invariant hypothesis: D3 = 2, D4 = 3, and D5 = 4. 
the points in the figures, $\log_{10}\bar\xi_N = C_N + D_N \log_{10}\bar\xi_2$, yield $C_3 = 0.15 \pm 0.05$ and $D_3 = 1.96 \pm 0.06$,
and $C_4 = 0.46 \pm 0.09$ and $D_4 = 3.03 \pm 0.18$ (the point of lowest $\bar\xi_2$ was not taken into account at the
third order). One worry is the luminosity dependence of the strength of the correlations on small scales
which we saw above. Table 2 and Fig. 9 show the results of least-square fits to the $\bar\xi_N$-$\bar\xi_2$ data for
each subsample independently; no systematic effects are seen. That is, within the noise, any luminosity
effects in the correlations are such as to move points along the lines in Fig. 8, not perpendicular to
them. In other words, the fact that galaxies of different luminosities show slightly different levels of bias
does not grossly affect the relationships between the moments. The horizontal lines in the lower panel
of the figure show the scale-invariant predictions; the fits to the individual subsamples indeed do cluster
around them, although the scatter is of course worse as $N$ increases. Irrespective of any theoretical
prejudice, the data do indicate rather unequivocally that the skewness and the kurtosis are indeed
proportional to (respectively) the square and the cube of the variance over the whole range accessible.
Interestingly, the derived $D_N$ do not agree as well with the scale-invariance predictions in the collapsed
clusters case (cf. Fig. 7), and the discrepancy increases with order.
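
The kind of regression described here, a straight-line fit of $\log_{10}\bar\xi_N$ against $\log_{10}\bar\xi_2$, can be sketched as follows. This is an unweighted ordinary least-squares fit applied to synthetic data generated with $\bar\xi_3 = 1.5\,\bar\xi_2^2$ and 0.05 dex of scatter. The data, the scatter and the function name are assumptions made for illustration; the weighting actually used for the survey measurements is not specified in this passage.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least-squares fit y = c + d * x; returns (c, d)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    d = sxy / sxx
    return my - d * mx, d


# Synthetic data only, not the survey measurements.
rng = random.Random(0)
xi2 = [10 ** (-1 + 2 * i / 19) for i in range(20)]           # 0.1 to 10
xi3 = [1.5 * v**2 * 10 ** rng.gauss(0.0, 0.05) for v in xi2]

c3, d3 = fit_line([math.log10(v) for v in xi2],
                  [math.log10(v) for v in xi3])
print("C3 =", c3, " D3 =", d3)
```

A slope $D_3$ close to 2 corresponds to the scale-invariant prediction, and the intercept then gives $\log_{10} S_3$.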

Given the previous results, we attempt to go one step further and plot in Fig. 10 the ratio
$S_3 = \bar\xi_3/\bar\xi_2^2$ versus the variance. The upper panel shows the raw determinations for each subsample;
average values are given in Table 2. In the lower panel, the open squares show the mean value of $S_3$
in bins of values of $\log_{10}\bar\xi_2$; the plotted error bars correspond to one standard deviation in each bin.
There is a weak trend of increasing $S_3$ with variance, but the data are also compatible with a constant
value of $S_3$, even from the quasi-linear to the highly non-linear regimes. The average over all values is
$S_3 = 1.5 \pm 0.5$, which is indicated by the dashed line. This error is larger than would be expected from
the linear fit to points in Fig. 8, both because of the difference between linear and logarithmic weighting,
and because two parameters are fit to the data in Fig. 8, while only one is fit here. Note that Table
2 shows that a least-square fit to the correlation function gives different slopes for $\bar\xi_2(\ell)$ and $(\bar\xi_3(\ell))^{1/2}$,
implying that $S_3$ cannot be perfectly constant. However, the data are noisy, and this difference in slope
(and thus the variation in $S_3$) is not significant; after all, we found that $D_3$ was consistent with the
scale-invariant prediction of 2.0.

The three error bars in the weakly non-linear regime show the ratio of the skewness and variance
squared measured in Gaussian windows in the QDOT sample (Saunders et al. 1991; cf. Coles & Frenk
1991). Since these latter authors did not look directly at $S_3$, we have taken the values they quote for
the variance and the skewness and used the standard propagation of errors.
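
The propagation of errors mentioned above can be sketched as follows. This is an illustration only: it assumes first-order (linearized) propagation with uncorrelated errors on the variance and the skewness, and the input numbers are made up rather than taken from Saunders et al. (1991).

```python
import math

def s3_with_error(xi2, sig_xi2, xi3, sig_xi3):
    """Return S3 = xi3 / xi2**2 and its first-order uncertainty.

    Assumption: the errors on xi2 and xi3 are independent, so the
    relative variances add, with a factor 2 for the squared variance.
    """
    s3 = xi3 / xi2 ** 2
    rel = math.sqrt((sig_xi3 / xi3) ** 2 + (2.0 * sig_xi2 / xi2) ** 2)
    return s3, s3 * rel

# Hypothetical values, for illustration only
s3, sig_s3 = s3_with_error(xi2=0.5, sig_xi2=0.05, xi3=0.4, sig_xi3=0.1)
print(f"S3 = {s3:.2f} +/- {sig_s3:.2f}")
```

If the two measurements are positively correlated, as they plausibly are when drawn from the same cells, this estimate overstates the uncertainty on the ratio.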

There is another approach to the measurement of $S_3$: at small scales, it is observed that the
three-point correlation function may be expressed as a symmetrized sum of double products of
two-point correlation functions times a constant called $Q$ (Peebles 1980). It then follows that
$\bar\xi_3 = 3 Q J_3 \xi_2^2$, where $J_3$ is given by

$$J_3 = \left(\frac{3}{4\pi}\right)^3 \int_0^1 d^3x_1\, d^3x_2\, d^3x_3\, \left(\|\mathbf{x}_1 - \mathbf{x}_2\|\, \|\mathbf{x}_2 - \mathbf{x}_3\|\right)^{-\gamma} . \qquad (19)$$

Thus $S_3 = 3 Q J_3/J_2^2$, where $J_2$ is given by Eq. 10. For the IRAS galaxies, we set $\gamma = 1.59$ (see Table
2) and find $J_3 = 2.69$ and thus $S_3 = 3.17\,Q$, yielding $Q = 0.49 \pm 0.16$. Alternatively, for $\gamma = 1.77$,
appropriate for optically selected galaxies, we find $J_3 = 3.69$, and so $S_3 = 3.34\,Q$. The triangles on the
right of the figure correspond to the values of $Q = 1.29 \pm 0.21$ (Groth & Peebles 1977), and $Q = 0.8 \pm 0.07$
(Peebles 1980). The latter value, which yields a value of $S_3$ consistent within $1.5\sigma$ with that determined
from the present sample, was obtained by analyzing the galaxy catalog of Davis, Geller, & Huchra (1978),
which was chosen to contain no prominent clusters, and is in agreement with the value measured from
the angular distribution of IRAS galaxies by Meiksin et al. (1992). This is the expected trend, since the
on the closed circles alone. After scaling by $(N - 1)$, the $N$-point correlation functions are in very
close agreement, as the scale-invariant hypothesis would predict (Eq. 7). However, the higher-order
correlations are noticeably steeper than are the lower orders, which is in the sense expected from finite
volume effects (Colombi et al. 1993a). We will quantify the comparison between the different orders
in the next section. Collapsing clusters increases all the correlations on small scales, as is expected,
and the effect is stronger for the higher-order correlations. Thus the higher-order correlation functions
are steeper in the cluster-collapsed case than when clusters are not collapsed. Reality lies somewhere
between these two extremes; the Finger of God in a cluster dilutes the true clustering on small scales,
while collapsing all cluster galaxies to a single point exaggerates it. However, the correlations on scales
above $3\,h^{-1}$ Mpc are insensitive to this, giving us greater faith in the large-scale correlations seen, even
at fifth order.



3.2. Relations between the correlation functions 
Figure 8: Determinations of $\bar\xi_N$ vs. $\bar\xi_2$ in our series of volume-limited sub-samples ($N = 3$, top panel,
and $N = 4$, bottom panel). The dashed lines show least-squares fits to the points.



We just saw that the correlation functions are well described by power laws, and the least-squares
fits we made suggest that the high-order $\bar\xi_N$ can be expressed simply as powers of $\bar\xi_2$. Fig. 8 checks
this directly by plotting $\bar\xi_3$ and $\bar\xi_4$ against $\bar\xi_2$. The relations are remarkably tight. Least-squares fits to
Figure 7: The 2, 3, 4, and 5-point correlation functions, averaged over subsamples, for the cases in
which clusters are not collapsed (the default in this paper; closed symbols) and when they are collapsed
(open symbols). The collapsing process boosts the correlations on small scales.
small scales is larger for more luminous galaxies. At large scale, the correlations steepen, as expected
from finite volume effects. The determination of $\bar\xi_5$ is quite noisy, with little dynamic range in any one
subsample. Nevertheless, we see that it is positive even at rather large scales.
Figure 6: Results of least-squares fits of power laws to the correlation functions as a function of the
subsample size, for each subsample separately. Thus the data are fit to the form
$\frac{1}{N-1}\log_{10}\bar\xi_N = A_N + B_N \log_{10}\ell$. The results for different $N$ are given different symbols, and are staggered slightly in
the ordinate to avoid overlapping error bars. These are the data tabulated in Table 2.



These systematic effects are quantified in Fig. 6, which shows the variations of the best-fit coefficients
$A_N$ and $B_N$ to the form $\frac{1}{N-1}\log_{10}\bar\xi_N = A_N + B_N \log_{10}\ell$ for $N = 2, 3, 4$, and 5, as a function
of the size of the subsample. These are the data tabulated in Table 2. The quantity $A_N$ is the logarithm
of the (scaled) correlation function at $\ell = 1\,h^{-1}$ Mpc. This figure shows that there is an increase of clustering
strength on small scales with luminosity for all correlations examined, as we saw directly from Figs. 2
and 5; for sample sizes above $100\,h^{-1}$ Mpc, the data are too noisy to check if the trends continue. There
is also a trend that the more luminous galaxies show a steeper slope. In the next section, we will show
that these luminosity effects do not affect the comparison of the higher-order $\bar\xi_N$'s with $\bar\xi_2$.

Fig. 7 shows the correlation function averaged over subsamples of Fig. 5, both with clusters 
collapsed (open circles) and without, which has been our default (closed circles). First let us concentrate 
Figure 5: Raw determinations of the $N$-body correlation function, $\bar\xi_N^{1/(N-1)}(\ell)$, for all 10 volume-limited
subsamples. The top panel corresponds to $N = 3$, the middle panel to $N = 4$, and the bottom
panel to $N = 5$. The higher-order correlations are not determined from the sparsest subsamples, which
is why fewer than ten curves are apparent in these plots.
Figure 4: Distribution of deviations, summed over all scales, of the raw determinations of $\bar\xi_2(\ell)$ (plotted
in Fig. 2) from our least-squares fit to it (short dashes of Fig. 3). Also plotted for comparison is a
Gaussian distribution (dots) with the same standard deviation as that of the histogram, $\sigma = 0.33$.

while the pair counts correlation function becomes more and more shallow at smaller separations. The
Fisher et al. correlation function is derived from the full IRAS sample with optimal weighting, which
means that the correlation function at small scales is dominated by counts from nearby and therefore
low-luminosity galaxies, while our equal-weight averaging gives more weight to more luminous galaxies.
The lower luminosity galaxies show weaker clustering, as we saw above, which causes the turnover of
the correlation function on small scales. Indeed, the triangles in Fig. 3 are in perfect agreement with $\bar\xi_2$
determined from the volume-limited sample to 2400 km s$^{-1}$, shown as open circles.

Fisher et al. (1993a; 1993b) show that the power spectrum and correlation function of IRAS 
galaxies are both in agreement with the angular correlation function derived from the APM survey 
(Maddox et al. 1990). The agreement on large scales between the Fisher et al. (1993b) volume-averaged
correlation function, and that derived in this paper, thus implies an agreement between the present 
results and those of the APM. 

Efstathiou et al. (1990) measured the variance in counts in cubical cells, $\bar\xi_2^{c}(\ell)$, in a sparser and deeper
redshift survey (Rowan-Robinson et al. 1990; hereafter QDOT) of IRAS galaxies. In order to compare
their measurements with ours, we used the relation given by Saunders et al. (1991), $\bar\xi_2(0.63\,\ell) \approx \bar\xi_2^{c}(\ell)$,
which holds both for white noise and for a Gaussian random field with a power-law correlation function
of index $-1.6$. Their results at large scale are shown as stars in the figure. Although they are consistent
with our results, they are on the high side. It seems from their Fig. 1 that this might be attributed to
counts in a single shell; thus the discrepancy is associated with just a few cells. Fisher et al. (1993a)
come to a similar conclusion through a power spectrum analysis of the present data set.

We now turn to higher-order correlation functions, shown in Fig. 5. The top panel shows $\bar\xi_3^{1/2}$,
the middle panel shows $\bar\xi_4^{1/3}$, and the bottom panel shows $\bar\xi_5^{1/4}$; we take roots of the data with the
scale-invariant predictions in mind (Eq. 7). The results from all subsamples are shown, but in practice,
the higher-order correlations are not determined from the sparsest sub-samples. The results of
least-squares fits to the data are summarized in Table 2. Again, we see systematic effects as a function of
sample volume, somewhat more pronounced than for $\bar\xi_2$. In particular, the correlation strength on
Figure 3: Average two-body correlation function $\bar\xi_2(\ell)$ obtained by making an equal-weight average
(open squares) of the raw determinations of Fig. 2. The solid triangles show a numerical integration
(Eq. 2) of the best direct determination of $\xi_2$ obtained by Fisher et al. (1993b). The open circles show
the correlation function $\bar\xi_2$ derived from the 2400 km s$^{-1}$ subsample alone; it is in good agreement with
the triangles, and the discrepancy between the circles and squares is a reflection of the variation in the
strength of the clustering as a function of luminosity. Also plotted (stars) are the (scaled) results from
the measurements in the QDOT sample by Efstathiou et al. (1990), as well as a least-squares fit to the
data in squares (short dashes).
Figure 2: Raw determinations of the two-body correlation function $\bar\xi_2(\ell)$, for all 10 volume-limited
subsamples. The top panel shows the subsamples limited to 2400, 3900, 4800, and 6000 km s$^{-1}$. The
middle one shows the 6000, 7600, 9500, and 12,000 km s$^{-1}$ subsamples, while the bottom panel displays
the largest subsamples corresponding to 12,000, 15,000, 19,000, and 24,000 km s$^{-1}$. In each panel, the
open squares are the correlation function determined from the smallest volume, the triangles from the
next volume, the stars from the next volume, and the open diamonds from the largest volume.
As definition (2) shows, the $\bar\xi_N$ are volume averages of the standard correlation functions. Although
the volume averaging procedure discards all detailed geometrical information in the $\xi_N$, it enhances the
signal-to-noise ratio, and preserves a fundamental quantity, namely the clustering strength as a function
of scale. Thus it allows us to explore the scaling properties of the statistics; given the data sets we have
at present, this might be the best we can do.

As we shall see, the sparseness of the IRAS sample prevents us from probing the deeply non-linear
regime in detail. But we can study the critical transition region between the linear and non-linear
regimes, over at least 2 decades in scale, and more than 3 decades in the value of the variance $\bar\xi_2$, so
that both domains are partly accessible.

3.1. Volume-Averaged Correlation Function

Fig. 2 shows the determinations of $\bar\xi_2(\ell)$ resulting from Eq. 16, for all the volume-limited
subsamples of Table 1. There are systematic effects with the size of the sample. The samples drawn from
the smallest volume do not allow any measurement of the correlations at large scales. As the sample
volume increases, more independent structures are included in the volume, and the correlations on large
scales increase, until they stabilize when the ratio of sample volume to $v = (4\pi/3)\ell^3$ becomes large
enough, just as expected from finite volume effects (Colombi et al. 1993a). At the smallest scales,
$\bar\xi_2$ tends to increase with the sample volume (although no determination is possible for the largest
volumes). This is probably evidence for a weak luminosity effect, in the sense that the more luminous
IRAS galaxies in the larger volume-limited subsamples are slightly more clustered than less luminous
galaxies. Davis et al. (1988) and Yahil et al. (1991) concluded that there were no gross effects as
a function of luminosity, but their tests were somewhat less sensitive than those here. Fisher et al.
(1993b) also concluded that the correlation function was independent of luminosity, but the smallest
volume for which they calculated the volume-limited correlation function was to 6000 km s$^{-1}$, by which
point the effects have largely disappeared. For each subsample, we make a least-squares fit of the points
to the form $\log_{10}\bar\xi_2 = A_2 + B_2 \log_{10}\ell$; the results are given in Table 2 and Fig. 6 below. The effects
we are seeing here are small, however; as Davis et al. point out, the stability of the correlation function
derived from different volumes argues against the hypothesis that the galaxy distribution is described
by a naive pure fractal (e.g., Pietronero 1987; cf. the discussion in Peebles 1993).

We average the determinations of $\bar\xi_2(\ell)$ of Fig. 2 at each separation with equal weights, yielding
the squares in Fig. 3; the error bars are the standard deviation from the mean. The volume-averaged
correlation function $\bar\xi_2(\ell)$ is quite well described by a single power law over more than 2 decades in
scale, from $\bar\xi_2(\ell) \sim 40$ at $\ell \sim 0.5\,h^{-1}$ Mpc down to a value as small as $\bar\xi_2(\ell) \sim 0.03$ at $\ell \sim 50\,h^{-1}$ Mpc,
with no significant sign of a "break" or a "bump" at large scale. A least-squares fit to the data yields
(short dashes) $\log_{10}\bar\xi_2(\ell) = A_2 + B_2 \log_{10}\ell$, with $A_2 = 1.17 \pm 0.05$ and $B_2 = -1.59 \pm 0.06$ (corresponding
to $\ell_0 = 5.44 \pm 0.53\,h^{-1}$ Mpc, where $\bar\xi_2(\ell_0) = 1$). Because the points in Fig. 3 are correlated with one
another, it is not strictly correct to make a least-squares fit to $\bar\xi_2$ (cf. the discussion in Fisher et al.
1993b). However, the deviations from our least-squares fit are reasonably Gaussian, as may be judged
from the distribution of deviations from our power-law fit plotted in Fig. 4. The quoted error bars are
thus representative of the dispersion of the data.
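
As a quick numerical check of the quoted numbers, the correlation length follows from setting the fitted power law to unity, $\ell_0 = 10^{-A_2/B_2}$. The sketch below uses only the central values of $A_2$ and $B_2$ quoted above; the function names are illustrative.

```python
import math

def xi2_fit(ell, A2=1.17, B2=-1.59):
    """Fitted volume-averaged correlation function; ell in h^-1 Mpc."""
    return 10.0 ** (A2 + B2 * math.log10(ell))

def correlation_length(A2=1.17, B2=-1.59):
    """Scale ell_0 (h^-1 Mpc) at which the fitted power law equals 1."""
    return 10.0 ** (-A2 / B2)

ell0 = correlation_length()
print(f"ell_0 = {ell0:.2f} h^-1 Mpc")
print(f"fit at 0.5 and 50 h^-1 Mpc: {xi2_fit(0.5):.1f}, {xi2_fit(50.0):.3f}")
```

The central values reproduce the quoted $\ell_0$ and the approximate amplitudes at both ends of the fitted range; the quoted uncertainty on $\ell_0$ is not recomputed here, since it depends on the covariance of $A_2$ and $B_2$.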

Fisher et al. (1993b) derive the correlation function $\xi(s)$ of the full IRAS sample using the standard
methods of pair counts. Because $\xi(s)$ is not a pure power law, we cannot use Eq. 10 to relate their
results to ours; rather, we fit a spline to $\xi(s)$ and numerically integrate Eq. 2 to find $\bar\xi_2$; the results are
shown as solid triangles in the figure. The two approaches to the correlation function agree perfectly
on large scales, but on smaller scales ($r < 3$ Mpc), the counts in cells analysis shows a systematically
larger correlation function. In particular, the counts in cells analysis gives a near-perfect power law.
most galaxies in the sample fall into at least one of the subsamples. Two possible alternatives are
the following: one could divide the sample into a series of shells around the observer, each considered 
as a sub-sample of a given average number density. One would then proceed to measure the counts 
in cells inside each shell, ignoring the effect of the number density gradient on the scale of the shell 
(Efstathiou et al. 1990). Alternatively, one could weight each galaxy of the full sample by the inverse 
of the selection function to obtain a constant average number density irrespective of distance to the 
observer, and proceed as in a volume-limited sample (Saunders et al. 1991). But this raises the issue of 
assessing the effect of a varying noise level on the scale of the cell for the statistics we might consider. 
In the present paper, we have taken a cautious approach; by treating galaxies of different luminosities 
separately, we avoid introducing any bias other than those that might be due to the sample selection 
alone. 



3. The Moments of the Count Distribution 

There are two approaches to measuring the correlation functions of galaxies from the counts in 
cells. The first works directly from the expressions relating the two types of statistics (cf., Eq. 1), and 
is explained in an Appendix. We find it to be of limited applicability, however, and here we relate 
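the correlation function to the moments of the counts distribution. Once we know the $P_N(\ell)$, we can
compute various centered moments of the distribution,

$$\mu_M(\ell) = \left\langle \left(\frac{N - \bar N}{\bar N}\right)^M \right\rangle = \sum_{N=0}^{\infty} P_N(\ell) \left(\frac{N - \bar N}{\bar N}\right)^M , \qquad (15)$$

where $\bar N = \langle N \rangle = \sum_N N P_N(\ell)$. Thus $\bar N$ is the average density of galaxies in a subsample, times $v$. The
volume-averaged correlation functions are the irreducible moments (e.g., Peebles 1980), which equal
zero for a Gaussian distribution. They are given by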



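$$\bar\xi_2(\ell) = \mu_2 - \frac{1}{\bar N} , \qquad \bar\xi_3(\ell) = \mu_3 - 3\frac{\mu_2}{\bar N} + \frac{2}{\bar N^2} , \qquad (16)$$

$$\bar\xi_4(\ell) = \mu_4 - 6\frac{\mu_3}{\bar N} - 3\mu_2^2 + 11\frac{\mu_2}{\bar N^2} - \frac{6}{\bar N^3} , \qquad (17)$$

and

$$\bar\xi_5(\ell) = \mu_5 - 10\frac{\mu_4}{\bar N} - \left(10\mu_2 - \frac{35}{\bar N^2}\right)\mu_3 + 30\frac{\mu_2^2}{\bar N} - 50\frac{\mu_2}{\bar N^3} + \frac{24}{\bar N^4} , \qquad (18)$$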

where we defined the average correlation functions of order $N$ over the cell volume $v = (4\pi/3)\ell^3$ in
Eq. 2. For $N = 2$, 3, and 4, these approach the variance, skewness, and kurtosis of the distribution in
the continuum limit, $\bar N \to \infty$. In the following we refer to these terms in that limit.
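
The conversion from count probabilities to volume-averaged correlation functions can be sketched in a few lines. This is an illustration rather than the code used for the paper: it assumes the moments are normalized as $\mu_M = \langle [(N - \bar N)/\bar N]^M \rangle$, which is the normalization consistent with Eqs. 14 and 16 - 18, and the function name is hypothetical. A Poisson distribution serves as a sanity check, since all $\bar\xi_N$ must vanish for unclustered points.

```python
import math

def xi_bar_from_counts(P):
    """Given P[N] = probability of N galaxies in a cell, return
    (Nbar, xi2, xi3, xi4, xi5) using the shot-noise-corrected moments.

    Assumption: mu_M is the M-th centered moment divided by Nbar**M.
    """
    Nbar = sum(N * p for N, p in enumerate(P))

    def mu(M):
        return sum(p * ((N - Nbar) / Nbar) ** M for N, p in enumerate(P))

    mu2, mu3, mu4, mu5 = mu(2), mu(3), mu(4), mu(5)
    xi2 = mu2 - 1.0 / Nbar
    xi3 = mu3 - 3.0 * mu2 / Nbar + 2.0 / Nbar**2
    xi4 = mu4 - 6.0 * mu3 / Nbar - 3.0 * mu2**2 + 11.0 * mu2 / Nbar**2 - 6.0 / Nbar**3
    xi5 = (mu5 - 10.0 * mu4 / Nbar - (10.0 * mu2 - 35.0 / Nbar**2) * mu3
           + 30.0 * mu2**2 / Nbar - 50.0 * mu2 / Nbar**3 + 24.0 / Nbar**4)
    return Nbar, xi2, xi3, xi4, xi5

# Sanity check on an unclustered (Poisson) distribution with mean 2
lam = 2.0
P_poisson = [math.exp(-lam) * lam**N / math.factorial(N) for N in range(80)]
print(xi_bar_from_counts(P_poisson))
```

For measured $P_N$ the same function applies directly, but the high-order results inherit the sensitivity to rare dense cells discussed below.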

Note that the contribution of the densest fluctuations increases with the order, making high orders
increasingly sensitive to rare high-density peaks. Thus, for example, $\bar\xi_5$ will depend strongly on the
nature of the richest clusters, which are rare enough that their number in the finite volume probed will
be dominated by Poisson fluctuations. One can in principle correct for this if the asymptotic behavior
of the $P_N$ at large $N$ is known or can be fitted for (cf. Colombi & Bouchet 1992; Colombi, Bouchet, &
Schaeffer 1993a). However, the $P_N$ reach their asymptotic forms only in the densely sampled regime
(Colombi et al. 1993b), which even the smallest volume-limited subsample of IRAS galaxies does not
approach. Thus it is quite difficult in practice to apply finite volume corrections to the moments derived
from the counts in cells in the IRAS survey, and we do not attempt to do so here.
Figure 1: Measured count probabilities $P_N(\ell)$ (solid), in the 2400 km s$^{-1}$ (top), 6000 km s$^{-1}$ (middle)
and 12,000 km s$^{-1}$ (bottom) volume-limited sub-samples, as a function of the count number $N$ for
different cell sizes (radii) $\ell$: from bottom to top the scales $\ell$ are 0.5, 0.8, 1.3, 2.0, 3.2, 5.0, 7.9, 12.6, and
19.9 $h^{-1}$ Mpc. The dots show a Gaussian distribution with the same variance as the data.
estimated via their measured redshift $z$, according to

$$H_0 r = \frac{2cz\,(z + 2 - \Omega_0)}{(1 + z)\left[1 + (1 + \Omega_0 z)^{1/2}\right]\left[1 - \Omega_0 + (1 + \Omega_0 z)^{1/2}\right]} . \qquad (13)$$

The proper distances corresponding to each outer radius are listed in Table 1. Luminosities are assigned
to galaxies as proportional to $f_{60} r^2$ in order to determine their membership in any given volume-limited
sample; thus we ignore the $K$-correction in calculating luminosity (cf. the discussion in Fisher et al.
1992; Fisher et al. 1993b). The results presented here were obtained with $\Omega_0 = 1$. Fisher et al. (1992)
found no evidence for number density evolution in this sample; we thus ignore this possible effect. We
adopt a value of $H_0 = 100$ km s$^{-1}$ Mpc$^{-1}$ throughout this paper.
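
The redshift-to-distance conversion can be sketched as follows. This is an illustration that assumes Eq. 13 is the standard Mattig relation for a matter-dominated Friedmann model, written in the factored form $H_0 r = 2cz(z + 2 - \Omega_0)/\{(1+z)[1 + (1+\Omega_0 z)^{1/2}][1 - \Omega_0 + (1+\Omega_0 z)^{1/2}]\}$; the function name and the value of the speed of light in km s$^{-1}$ are supplied here, not taken from the text.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def proper_distance(z, omega0=1.0, H0=100.0):
    """Distance r in Mpc (h^-1 Mpc when H0 = 100 km/s/Mpc) for redshift z.

    Assumption: matter-dominated Friedmann model with no cosmological
    constant, i.e. the Mattig relation.
    """
    s = math.sqrt(1.0 + omega0 * z)
    num = 2.0 * C_KM_S * z * (z + 2.0 - omega0)
    den = (1.0 + z) * (1.0 + s) * (1.0 - omega0 + s)
    return num / den / H0

# Outer radius of the deepest subsample, 24,000 km/s
z_max = 24000.0 / C_KM_S
print(proper_distance(z_max, omega0=1.0), proper_distance(z_max, omega0=0.1))
```

For $\Omega_0 = 1$ the expression reduces to $r = (2c/H_0)[1 - (1+z)^{-1/2}]$, and at the depths of these subsamples the choice of $\Omega_0$ changes the distances only slightly.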

Three further series of volume-limited samples were created to test the sensitivity of the counts
to various effects. In the first, we used Eq. 13 with $\Omega_0 = 0.1$; this makes negligible differences in the
results. In the second, we used $\Omega_0 = 1$, but placed all galaxies in the seven clusters in Table 2 of Yahil
et al. (1991) at a common redshift to collapse the fingers of God associated with each. We show below
the effect this has on the moments. Finally, Strauss et al. (1992a) show that the galaxy densities in
cores of clusters determined from IRAS galaxies are systematically lower than those determined from
optically selected galaxies; with this in mind, we give galaxies associated with cluster cores a weight
given by the ratio of the seventh and fifth columns of Table 2 of Strauss et al. (1992a). We refer to the
counts derived from this analysis as the boosted counts.

Within each volume-limited subsample, we place down $10^6$ points at random, and count the
number of galaxies within a series of concentric spheres around each point. We only count spheres which
are completely included within the subsample volume, and which do not intersect the largest region of 
sky uncovered by the survey, namely that at $|b| < 5°$. In addition, four percent of the high-latitude
sky is uncovered by the survey (Strauss et al. 1990); we fill these regions with random points at the
same number density as the observed galaxies. The number of random galaxies in each subsample
is indicated in the last column of Table 1. The large number of randomly placed spheres makes the
measurement error of the counts negligible. The statistical errors due to the finite volume of the samples
are non-negligible, however, and will be discussed further in the next section.
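
The counting procedure can be illustrated with a deliberately simplified sketch. It assumes a unit cubic box in place of the real survey geometry (no galactic-plane mask, no filled regions), a single sphere radius per call rather than concentric spheres, and far fewer spheres than the text describes; all names and numbers below are illustrative.

```python
import random
from collections import Counter

def counts_in_cells(points, radius, n_spheres, box=1.0, seed=0):
    """Estimate P_N for spheres of a given radius thrown at random.

    Sphere centres are drawn so that every sphere lies entirely inside
    the box, mimicking the rule that only spheres fully contained in
    the subsample volume are counted.
    """
    rng = random.Random(seed)
    r2 = radius * radius
    hist = Counter()
    for _ in range(n_spheres):
        c = [rng.uniform(radius, box - radius) for _ in range(3)]
        n = sum(1 for p in points
                if (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 + (p[2] - c[2]) ** 2 <= r2)
        hist[n] += 1
    nmax = max(hist)
    return [hist[n] / n_spheres for n in range(nmax + 1)]

# Unclustered test catalogue: 500 uniform random points in the unit box
rng = random.Random(1)
galaxies = [(rng.random(), rng.random(), rng.random()) for _ in range(500)]
P = counts_in_cells(galaxies, radius=0.1, n_spheres=2000)
mean_count = sum(n * p for n, p in enumerate(P))
print(f"mean count per cell = {mean_count:.2f}")
```

The brute-force distance loop is adequate for a toy catalogue; for samples of thousands of galaxies and very many spheres, a spatial index would be the natural replacement.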

Fig. 1 shows the resulting counts in cells for the subsamples volume-limited to 2400 km s$^{-1}$ (top),
6000 km s$^{-1}$ (middle), and 12,000 km s$^{-1}$ (bottom). If the galaxies are unclustered, the $P_N$ are given
by the Poisson distribution, Eq. 5. We do not show this case in the figure to reduce the clutter; suffice
it to say that the curves deviate strongly from this limit. A more interesting case is to assume that the
second moment of the distribution describes it in full. In the limit that the higher-order moments are
negligible, we might approximate the $P_N$ as a Gaussian:

$$P_N = \frac{1}{(2\pi \bar N^2 \mu_2)^{1/2}} \exp\left[-\frac{(N - \bar N)^2}{2 \bar N^2 \mu_2}\right] , \qquad (14)$$

where $\mu_2$ is the second moment of the counts, given in Eq. 15 below. This Gaussian expression is shown
as dots in the figure. Note that Eq. 14 is not the expression one would get by assuming that the
$\bar\xi_N$ are negligible for $N > 2$ in Eq. 1. In any case, the figure shows that Eq. 14 is a poor fit to the
data. Notice, however, that for the larger scales (the curves which peak at larger $N$ in the figure), the
relative importance of the higher-order correlations drops, and the Gaussian expression becomes a better
approximation. Moreover, in the sparsely sampled limit (lowest panel of the figure), the higher-order
correlations become less important, and again the Gaussian approximation becomes better.

Our volume-limiting strategy means that a fraction of the galaxy data is unused (compare the
numbers in Table 1 with the 5304 galaxies of the full survey), which is of course not optimal. However,
Bouchet, & Colombi (1993a) show how the convolution of the density field with a Gaussian or spherical
top-hat window (appropriate for the counts-in-cells analyses we use here) introduces a dependence of
$S_3$ on the local slope of the power spectrum. They further show how the comparison of the variance
and skewness of the galaxy distribution can be used to distinguish between models with Gaussian
distributed initial density perturbations, and models with exotic seeds such as textures, for which one
would generically expect $\bar\xi_3 \propto \bar\xi_2^{3/2}$. An introduction to these latter topics may be found in Juszkiewicz
& Bouchet (1992). Of course, there is no a priori reason to believe that the correlation hierarchy
established on a given scale during the mildly non-linear stages of galaxy evolution will survive the later
strongly non-linear stages; in addition, one might expect mode coupling between small and large scales
(cf. Little, Weinberg, & Park 1991). However, results of numerical simulations by Bouchet & Hernquist
(1992), Weinberg & Cole (1992), Lahav et al. (1993), and Fry, Melott, & Shandarin (1993) have indeed
shown that measurements at large scale obey the perturbative theory, even when the systems have
reached full non-linearity at small scales. The self-similar BBGKY solutions imply that scale invariance
can be generated on very non-linear scales, but there is no a priori connection between the values of
$S_N$ determined on large and small scales, and there is no reason to assume that they are the same.

In this paper, we examine the nature of the $P_N$ as derived from volume-limited subsamples of a
redshift survey of galaxies detected by the Infrared Astronomical Satellite (IRAS). The sample consists
of 5304 galaxies with 60 micron flux density above 1.2 Jy, selected over 87.6% of the sky. The selection
criteria for the galaxies are given in Strauss et al. (1990) and Fisher (1992), and the data for the brighter
half of the sample are given in Strauss et al. (1992b). IRAS galaxies are a dilute tracer of the galaxian
density field (Strauss et al. 1992a), with typically 1/3 the number density of galaxies appearing in
optically selected samples of comparable depth. Thus this paper emphasizes those properties of the
counts distribution that are accessible in the low-density limit; the large volume covered by our sample
allows many independent volumes of a given size at a given number density to be probed. In particular,
we explore the relationships between the counts in cells and the correlation functions, and are able only
to start to probe the asymptotic forms that the count distributions take in the limit of large density
(cf. BSD, and Colombi, Bouchet & Schaeffer 1993b). A preliminary version of this work was presented
in Bouchet, Davis, & Strauss (1992).

The outline of the paper is as follows. In § 2, we discuss the measurement of the $P_N$ from the IRAS
data. In § 3 the various moments of the counts distribution are derived from the $P_N$. The correlation
functions are presented in § 3.1. In § 3.2, we explore the relationship between correlation functions
of different order. The scaling of the void probability function with density is shown in § 4, and the $P_N$
are compared with various scale-invariant models. We discuss the results and summarize in § 5.

2. The Sample and its Analysis 

We select from the redshift sample ten volume-limited subsamples, each containing roughly twice
the volume of the previous one. The outer radius, number of galaxies included, and the corresponding
minimum luminosity of each subsample are given in Table 1. Observed heliocentric redshifts are
corrected to the barycenter of the Local Group using the correction of Yahil, Tammann & Sandage (1977).
No correction is made to the redshifts for peculiar velocities. Our rationale for this is two-fold: Bouchet
et al. (1992b) show that the value of $S_3$ is insensitive to peculiar velocities in second-order perturbation
theory, and our method for self-consistently correcting galaxy redshifts for peculiar velocities (Yahil et
al. 1991) is unable to properly model small-scale features in the velocity field (Davis, Strauss, & Yahil
1991), which could cause unknown effects on the counts in cells analysis. Thus proper distances $r$ are
where $\sigma$ quantifies the deviation of $P_0$ from the Poisson prediction (Eq. 3 above), then $\sigma$ is a unique
function of the quantity $N_c$, given by

$$N_c = \bar N\, \bar\xi_2(v) . \qquad (9)$$

If $\xi_2$ is a power law of index $\gamma$, one has (Peebles & Groth 1976)

$$\bar\xi_2 = J_2\, \xi_2 , \qquad J_2 = \frac{72}{(3 - \gamma)(4 - \gamma)(6 - \gamma)\, 2^\gamma} , \qquad (10)$$

while the average number of neighbors of a galaxy in a sphere of radius $r$, minus the number in a
homogeneous universe, is

$$N_{\rm cluster} = n \int_0^r \xi_2\, d^3r = 3 \bar N \xi_2/(3 - \gamma) , \qquad (11)$$

or

$$N_c = \frac{24}{(4 - \gamma)(6 - \gamma)\, 2^\gamma}\, N_{\rm cluster} \approx 0.75\, N_{\rm cluster} \quad {\rm for}\ \gamma = 1.8 . \qquad (12)$$

Thus $N_c(\ell)$ is an indicator of the typical number of clustered particles on the scale $\ell$.
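
The numerical factors in these relations are easy to evaluate. The sketch below assumes the Peebles & Groth form $J_2 = 72/[(3-\gamma)(4-\gamma)(6-\gamma)2^\gamma]$ for a pure power law and the ratio $N_c/N_{\rm cluster} = 24/[(4-\gamma)(6-\gamma)2^\gamma]$ that follows from it; the function names are illustrative.

```python
def J2(gamma):
    """Ratio of the volume-averaged to the ordinary two-point function
    for a power law xi_2 ~ r**(-gamma) averaged over a sphere."""
    return 72.0 / ((3.0 - gamma) * (4.0 - gamma) * (6.0 - gamma) * 2.0 ** gamma)

def nc_over_ncluster(gamma):
    """Ratio N_c / N_cluster, i.e. J2(gamma) * (3 - gamma) / 3."""
    return 24.0 / ((4.0 - gamma) * (6.0 - gamma) * 2.0 ** gamma)

for g in (1.59, 1.77, 1.8):
    print(f"gamma = {g}: J2 = {J2(g):.3f}, Nc/Ncluster = {nc_over_ncluster(g):.3f}")
```

The same `J2` values enter the relation between $S_3$ and $Q$ used later in the paper, so the function can be reused there with the appropriate $\gamma$.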

If scale invariance (Eq. 6) holds, Eqs. (1) and (4) imply a number of scaling relations of the count
probability distributions (Balian & Schaeffer 1989a) which involve two universal functions governing the
behavior of the $P_N$. A particularly clear and succinct summary of these relations is given in Bouchet,
Schaeffer, & Davis (1991; hereafter BSD), who find that they are well satisfied in simulations of a
universe dominated by Cold Dark Matter. Bouchet & Hernquist (1992) find similar results for a white
noise initial spectrum, although numerical limitations do not allow them to make as firm a statement for
Hot Dark Matter. In any case, these studies seem to indicate that scale invariance is a generic result of
the gravitational instability process applied to gravitating matter, when density contrasts become large.
Observationally, the scale-invariant predictions have been found to hold for the galaxy distribution in the
Center for Astrophysics redshift survey (CfA) (Alimi, Blanchard, & Schaeffer 1990) and the Southern
Sky Redshift Survey (SSRS) (Maurogordato, Schaeffer, & da Costa 1992) (see also Gaztañaga 1992).
Unfortunately, the limited volume of these surveys has not allowed all aspects of the scale-invariant
predictions to be tested over a large range in $\bar\xi_2$. In particular, as we will see below, the most stringent
tests of the scale-invariant hypothesis occur at the highest sampling densities, which existing redshift
surveys probe in only very small volumes.

There is not as yet any dynamical theory explaining the onset of scale invariance in the fully
non-linear regime, although self-similar solutions of the BBGKY hierarchy do exist (Davis & Peebles
1977). However, when density contrasts are weak, one can use a perturbative approach to study the
early stages of the process. The statistical properties of a Gaussian random density field (i.e., one
in which the phases of its different Fourier modes are uncorrelated) are completely described by the
power spectrum, which is the Fourier conjugate of the two-point correlation function; all higher-order
(reduced) correlations are zero. In linear theory, the growth of density perturbations under gravitational
instability preserves the Gaussian nature of the density field. However, it was recognized long ago that
even when density contrasts are small, non-linearities induce deviations from Gaussianity. Bernardeau
(1992) has shown that in the weakly non-linear regime (i.e., as long as $\bar\xi_2 \ll 1$), gravity applied to an
initially Gaussian density field induces the correlation hierarchy $\bar\xi_N = S_N \bar\xi_2^{N-1}$ for all $N > 1$, where
the $S_N$ are independent both of scale and of the initial power spectrum of density fluctuations. This
generalizes to arbitrary $N$ the result already obtained for $S_3$ by Peebles (1980), for $S_4$ by Fry (1984), and
for $S_5$ by Goroff et al. (1986). Bouchet et al. (1992b) have furthermore shown analytically that, in this
weakly non-linear regime, $S_3$ is insensitive to the value of the density parameter $\Omega_0$, and is only weakly
affected by the mapping from real space to redshift space, although both $\bar\xi_2$ and $\bar\xi_3$ change. Juszkiewicz,
point correlation function. However, there may exist scaling relations, which we present below, which
allow us to express the $P_N$ in density-independent ways.

The early work on counts in cells is summarized by Peebles (1980), where its connection to the
correlation function is made (cf. White 1979). In particular, the void probability function can be related
to an infinite sum of the $N$-point correlation functions:

$$P_0(\ell) = \exp\left[\sum_{N=1}^{\infty} \frac{(-\bar N)^N}{N!}\, \bar\xi_N(v)\right] , \qquad (1)$$



where $\bar N$ is the expected number of galaxies in the absence of clustering in the volume $v = (4\pi/3)\ell^3$, 
and $\bar\xi_N(v)$ is the volume average of the correlation function: 

$$\bar\xi_N(v) = \frac{1}{v^N} \int_v d^3r_1 \, d^3r_2 \ldots d^3r_N \; \xi_N(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) . \qquad (2)$$

Note that in an unclustered universe ($\bar\xi_N = 0$ for $N > 1$), the void probability function is given by the 
Poisson expression 

$$P_0(\ell) = \exp(-\bar N) . \qquad (3)$$

The probability that a cell of volume $v$ contains $N$ galaxies, $P_N(\ell)$, is directly related to $P_0$: the void 
probability function clearly depends on the average density of galaxies $n = \bar N/v$, and one can derive 
from Eq. 1: 

$$P_N(\ell) = \frac{(-n)^N}{N!} \frac{d^N P_0(n, \ell)}{dn^N} . \qquad (4)$$



For example, inserting Eq. 3 in Eq. 4 yields 



$$P_N(\ell) = \frac{\bar N^N}{N!} e^{-\bar N} , \qquad (5)$$

which is just the Poisson distribution, as it should be in the unclustered limit. 
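
The step from Eq. 3 to Eq. 5 uses only the fact that, with $P_0 = \exp(-nv)$, each derivative with respect to $n$ brings down a factor $-v$, so that $(-n)^N (-v)^N / N! = \bar N^N / N!$. The unclustered limit can also be checked numerically. The following is a minimal Python sketch, not the procedure used in this paper: it scatters points uniformly in a unit box, throws random spheres fully contained in the box, and compares the measured $P_N$ with Eq. 5. The function names, the box size, the number of points, and the sphere radius are all illustrative assumptions; a fixed number of points in a finite box is only approximately a Poisson process.

```python
import math
import random


def counts_in_cells(points, radius, n_cells, rng, box=1.0):
    """Return {N: fraction of randomly placed spheres containing N points}."""
    r2 = radius ** 2
    hist = {}
    for _ in range(n_cells):
        # Keep each sphere entirely inside the box to avoid edge effects.
        cx, cy, cz = (rng.uniform(radius, box - radius) for _ in range(3))
        n = sum(
            1
            for (x, y, z) in points
            if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= r2
        )
        hist[n] = hist.get(n, 0) + 1
    return {n: k / n_cells for n, k in hist.items()}


def poisson_pn(n, nbar):
    """Eq. 5: Poisson probability of finding n objects when nbar are expected."""
    return nbar ** n * math.exp(-nbar) / math.factorial(n)


rng = random.Random(42)          # fixed seed so the run is repeatable
n_points = 1000                  # assumed sample size (illustrative)
radius = 0.1                     # assumed cell radius in box units
points = [(rng.random(), rng.random(), rng.random()) for _ in range(n_points)]

# Expected count per cell in the absence of clustering: nbar = n * v.
nbar = n_points * (4.0 / 3.0) * math.pi * radius ** 3
cpdf = counts_in_cells(points, radius, 1000, rng)

for n in sorted(cpdf):
    print(f"N={n:2d}  measured={cpdf[n]:.3f}  Poisson={poisson_pn(n, nbar):.3f}")
```

Because the spheres overlap heavily, the measured fractions are correlated and scatter around the Poisson values by more than naive counting errors would suggest.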

Based on early results for the three- and four-point correlation functions in the non-linear regime, a 
number of workers hypothesized the existence of a scaling hierarchy, in which the high-order correlations 
could be expressed as symmetrized sums of lower-order correlation functions (cf. Fry & Peebles 1978; 
Fry 1984; Schaeffer 1984; Sharp, Bonometto, & Lucchin 1984). The subject reached maturity with 
the papers of Balian & Schaeffer (1988; 1989a; 1989b) who start with a generic assumption of scale 
invariance for the correlation function hierarchy: 

$$\xi_N(\lambda \mathbf{r}_1, \ldots, \lambda \mathbf{r}_N) = \lambda^{-\gamma(N-1)} \, \xi_N(\mathbf{r}_1, \ldots, \mathbf{r}_N) , \qquad (6)$$

for any value of $N$ over some (large) range of scales. This happens, for instance, if the $\xi_N$ are proportional 
to a symmetric product of $N - 1$ two-point correlation functions, as is suggested by the observed form of 
the three- and four-point correlation functions in the non-linear regime ($\bar\xi_2 \gtrsim 1$). In any case, it implies 
that the volume-averaged correlation functions satisfy: 

$$\bar\xi_N(\ell) = S_N \, \bar\xi_2^{N-1}(\ell) , \qquad (7)$$

where the $S_N$ are independent of scale (although different scale-invariant systems or different scale 
ranges may have different sets of values of the $S_N$). An immediate consequence of this follows by 
inserting Eq. 7 into Eq. 1: if we write 

$$P_0(\ell) = \exp[-\bar N \sigma(n, v)] , \qquad (8)$$
Abstract 



We have measured the count probability distribution function (CPDF) in a series of 10 volume-limited 
sub-samples of a deep redshift survey of IRAS galaxies. The CPDF deviates significantly from both the 
Poisson and Gaussian limits in all but the largest volumes. We derive the volume-averaged 2, 3, 4, and 
5-point correlation functions from the moments of the CPDF, and find them all to be reasonably well 
described by power laws. Weak systematic effects with the sample size provide evidence for stronger 
clustering of galaxies of higher luminosity on small scales. Nevertheless, remarkably tight relationships 
hold between the correlation functions of different order. In particular, the "normalized" skewness 
defined by the ratio $S_3 = \bar\xi_3/\bar\xi_2^2$ varies at most weakly with scale in the range $0.1 < \bar\xi_2 < 10$. That is, 
$S_3$ is close to constant ($= 1.5 \pm 0.5$) from weakly to strongly non-linear scales. On small scales, this is 
consistent with previous determinations of the three-point correlation function $\zeta = \xi_3$. On larger scales, 
this conforms with the hypothesis of the growth of observed structures by gravitational clustering of 
initially Gaussian density fluctuations. We similarly find that $\bar\xi_4$ is proportional to the third power of 
$\bar\xi_2$ in the same range of $\bar\xi_2$, and there is weak evidence that $\bar\xi_5$ is proportional to the fourth power of 
$\bar\xi_2$. Furthermore, we find that the void probability function obeys a scaling relation with density to 
great precision, in accord with the scale-invariance hypothesis ($\bar\xi_N \propto \bar\xi_2^{N-1}$). Double-counting cluster 
galaxies in order to match the cluster overdensities seen in optically selected samples of galaxies greatly 
increases the derived values of $\bar\xi_3$ and $\bar\xi_4$, although the scaling between the correlations of different 
orders remains. Unfortunately, the relative sparseness of the IRAS sample precludes using it to make 
the most demanding tests of scale invariance, which rely on the overall shape of the CPDF at different 
scales. In this sparse limit, various models for the CPDF become degenerate, and fit the IRAS data 
nearly equally well. Indeed, the CPDF is well fitted by both the negative binomial distribution and the 
thermodynamical model of Saslaw and Hamilton, and to a somewhat lesser extent by the log-normal 
distribution. All three models fit the data poorly for the densest subsample of IRAS galaxies examined, 
but this may be more a reflection of finite volume effects than of the inadequacy of the models. 

1. Introduction 

The two-point correlation function of the galaxy distribution, and its Fourier transform, the 
power spectrum, have long been the principal statistical tools by which astronomers have quantified 
the clustering of galaxies (Peebles 1980 and references therein). However, as redshift surveys have 
revealed ever more complex structures in the distribution of galaxies, the need for statistics which 
illuminate other aspects of the galaxy distribution has become acute. For example, the large voids 
recently discovered in the galaxy distribution (de Lapparent, Geller, & Huchra 1986; Kirshner et al. 
1987; Geller & Huchra 1989) are not mirrored by any feature in the two-point correlation function, 
and their description requires a rather different statistic. On small scales ($\lesssim 500$ km s$^{-1}$), one observes 
clusters of galaxies which represent enormous overdensities, and whose properties are only completely 
described by the full complement of $N$-point correlation functions up to values of $N$ equal to the number 
of galaxies in the cluster. In practice, standard techniques have great difficulty measuring correlation 
functions of order four and higher from finite samples (Peebles 1980). Another approach is that taken 
by Szapudi, Szalay, & Boschán (1992) and Meiksin, Szapudi, & Szalay (1992) (see also Szapudi & Szalay 
1993), who relate counts in cells to higher-order correlations, enabling them to go to $N = 8$ in the 
angular correlation function. 
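
The link between counts in cells and higher-order correlations can be made concrete at the lowest orders. The following is a minimal Python sketch, not the estimator of any of the papers cited above: it assumes the galaxies Poisson-sample an underlying density field, in which case the shot-noise-corrected central moments $\mu_k$ of the counts give $\bar\xi_2 = (\mu_2 - \bar N)/\bar N^2$ and $\bar\xi_3 = (\mu_3 - 3\mu_2 + 2\bar N)/\bar N^3$. The function names and the negative binomial parameters used as input are illustrative choices.

```python
import math


def correlations_from_cpdf(p):
    """Given p[N] = probability of N galaxies in a cell, return (nbar, xi2, xi3).

    Assumes Poisson sampling, so the discreteness (shot-noise) terms can be
    subtracted from the central moments of the counts.
    """
    nbar = sum(n * pn for n, pn in enumerate(p))
    mu2 = sum((n - nbar) ** 2 * pn for n, pn in enumerate(p))
    mu3 = sum((n - nbar) ** 3 * pn for n, pn in enumerate(p))
    xi2 = (mu2 - nbar) / nbar ** 2
    xi3 = (mu3 - 3.0 * mu2 + 2.0 * nbar) / nbar ** 3
    return nbar, xi2, xi3


def negative_binomial_cpdf(r, prob, n_max):
    """Negative binomial probabilities for N = 0..n_max (integer r assumed)."""
    q = 1.0 - prob
    return [math.comb(n + r - 1, n) * prob ** r * q ** n for n in range(n_max + 1)]


# Illustrative input: a negative binomial CPDF, truncated far out in the tail.
cpdf = negative_binomial_cpdf(r=2, prob=0.5, n_max=200)
nbar, xi2, xi3 = correlations_from_cpdf(cpdf)
s3 = xi3 / xi2 ** 2              # normalized skewness S_3

print(f"nbar={nbar:.4f}  xi2={xi2:.4f}  xi3={xi3:.4f}  S3={s3:.4f}")
```

For a negative binomial CPDF, $\bar\xi_3 = 2\bar\xi_2^2$, so the ratio $S_3$ should come out close to 2 whatever parameters are chosen; for a Poisson CPDF both $\bar\xi_2$ and $\bar\xi_3$ vanish and the ratio is undefined.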

In this paper, we examine various properties of the counts of galaxies in cells of a given size $\ell$. 
In particular, we define the quantity $P_N(\ell)$ as the fraction of randomly positioned spheres of radius $\ell$ 
containing exactly $N$ galaxies, for a given galaxy sample. As is clear from the definition, $P_N(\ell)$ depends 
strongly on the mean density of galaxies in a sample, a property which it does not share with the $N$-






Moments of the Counts Distribution 
in the 1.2 Jy IRAS Galaxy Redshift Survey$^1$ 



Francois R. Bouchet 

Institut d'Astrophysique de Paris, CNRS, 
98 bis Boulevard Arago, F-75014 Paris, FRANCE 



Michael A. Strauss 

Institute for Advanced Study, School of Natural Sciences 
Princeton, New Jersey 08540 



Marc Davis 

Astronomy and Physics Departments, University of California 
Berkeley, California 94720 

Karl B. Fisher 

Institute of Astronomy, Madingley Rd., Cambridge CB3 0HA, England 

Amos Yahil 

Astronomy Program, State University of New York 
ESS Building, Stony Brook, New York 11794-2100 



John P. Huchra 

Harvard-Smithsonian Center for Astrophysics, 60 Garden Street 
Cambridge, Massachusetts 02138 



$^1$ Based in part on data obtained at Lick Observatory, operated by the University of California; the Multiple Mirror 
Telescope, a joint facility of the Smithsonian Astrophysical Observatory and the University of Arizona; and Cerro Tololo 
Inter-American Observatory, operated by the Association of Universities for Research in Astronomy, Inc., under contract 
with the National Science Foundation. 






